51 research outputs found

    Two new ArrayTrack libraries for personalized biomedical research

    BACKGROUND: Recent advances in high-throughput genotyping technology are paving the way for research in personalized medicine and nutrition. However, most of the genetic markers identified from association studies make only a small contribution to the total risk or benefit of the studied phenotypic trait. Testing whether the candidate genes identified by association studies are causal is critically important to the development of personalized medicine and nutrition. An efficient data mining strategy and a set of sophisticated tools are needed to better understand and use the findings from genetic association studies. DESCRIPTION: SNP (single nucleotide polymorphism) and QTL (quantitative trait locus) libraries were constructed and incorporated into ArrayTrack, with user-friendly interfaces and powerful search features. Data from several public repositories were collected in the SNP and QTL libraries and connected to other domain libraries (genes, proteins, metabolites, and pathways) in ArrayTrack. Linking the data sets within ArrayTrack allows searching of SNP and QTL data as well as their relationships to other biological molecules. The SNP library includes approximately 15 million human SNPs and their annotations, while the QTL library contains publicly available QTLs identified in mouse, rat, and human. The QTL library was developed to find overlaps between the map position of a candidate or metabolic gene and QTLs from these species. Two use cases were included to demonstrate the utility of these tools. The SNP and QTL libraries are freely available to the public through ArrayTrack at http://www.fda.gov/ArrayTrack. CONCLUSIONS: These libraries in ArrayTrack contain comprehensive information on SNPs and QTLs and are further cross-linked to other libraries. Connecting domain-specific knowledge is a cornerstone of systems biology strategies and allows a better understanding of the genetic and biological context of the findings from genetic association studies.
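
    The QTL search described above reduces to an interval-overlap test between a gene's map position and each QTL. The following is a minimal sketch, not ArrayTrack's implementation: it assumes genes and QTLs are already expressed as physical intervals on the same assembly (real QTLs are often reported in centimorgans or by flanking markers, and cross-species comparison would first need ortholog mapping). The `Interval` type, `overlapping_qtls` function, and all example names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A named region on one chromosome, 1-based and inclusive (assumed convention)."""
    name: str
    chrom: str
    start: int
    end: int


def overlapping_qtls(gene: Interval, qtls: list[Interval]) -> list[str]:
    """Return the names of QTLs whose interval overlaps the gene's map position."""
    hits = []
    for qtl in qtls:
        # Two closed intervals on the same chromosome overlap when
        # each one starts no later than the other ends.
        if qtl.chrom == gene.chrom and qtl.start <= gene.end and gene.start <= qtl.end:
            hits.append(qtl.name)
    return hits


# Hypothetical example data, not taken from the ArrayTrack QTL library.
qtls = [
    Interval("QTL_A", "chr1", 1_000_000, 5_000_000),
    Interval("QTL_B", "chr1", 6_000_000, 9_000_000),
    Interval("QTL_C", "chr2", 1_000_000, 5_000_000),
]
candidate = Interval("GENE_X", "chr1", 4_900_000, 4_950_000)
print(overlapping_qtls(candidate, qtls))  # ['QTL_A']
```

    A linear scan is adequate for a few thousand QTLs; for genome-scale queries a sorted index or interval tree would be the more usual choice.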

    Polymerization and nucleic acid-binding properties of human L1 ORF1 protein

    The L1 (LINE 1) retrotransposable element encodes two proteins, ORF1p and ORF2p. ORF2p is the L1 replicase, but the role of ORF1p is unknown. Mouse ORF1p, a coiled-coil-mediated trimer of ∼42-kDa monomers, binds nucleic acids and has nucleic acid chaperone activity. We purified human L1 ORF1p expressed in insect cells and made two findings that significantly advance our knowledge of the protein. First, in the absence of nucleic acids, the protein polymerizes under the very conditions (0.05 M NaCl) that are optimal for high-affinity (∼1 nM) nucleic acid binding. The non-coiled-coil C-terminal half mediates formation of the polymer, an active conformer that nucleic acid instantly resolves into trimers or multimers thereof. Second, the protein has a biphasic effect on mismatched double-stranded DNA, a proxy chaperone substrate: it protects the duplex from dissociation at 37°C before eventually melting it when largely polymeric. Therefore, polymerization of ORF1p appears to affect its interaction with nucleic acids. Additionally, polymerization of ORF1p at its translation site could explain the heretofore inexplicable phenomenon of cis preference, the favored retrotransposition of the actively translated L1 transcript, which is essential for L1 survival.

    ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

    BACKGROUND: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, eliminating repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. RESULTS: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads from Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning the analysis. CONCLUSION: The ReRep approach for identifying repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in an L. braziliensis GSS dataset generated in our laboratory and were further validated by analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties of the dataset, in particular to the length of sequencing reads and the genome coverage. ReRep is freely available for academic use at http://bioinfo.pdtis.fiocruz.br/ReRep/.
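
    The abstract notes that ReRep's parameters must be tuned to read length and genome coverage, but it does not describe the pipeline's internals. To illustrate why such tuning matters, here is a generic k-mer counting heuristic, a swapped-in technique rather than ReRep's algorithm. In a low-coverage survey a single-copy locus is sampled by only a few reads, so a k-mer shared by many reads hints at a repeat. The function names, `k`, `min_count`, and the toy reads are all hypothetical.

```python
from collections import Counter
from collections.abc import Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def flag_repetitive_reads(reads: dict[str, str], k: int, min_count: int) -> set[str]:
    """Return IDs of reads containing a k-mer that occurs in at least min_count reads.

    min_count should sit well above the number of reads expected to cover a
    single-copy locus at the dataset's coverage, and k is bounded by read
    length, so both must be tuned per dataset (assumed heuristic).
    """
    counts: Counter[str] = Counter()
    for seq in reads.values():
        # Count each k-mer once per read, so a tandem repeat inside a
        # single read cannot reach the threshold on its own.
        counts.update(set(kmers(seq, k)))
    return {
        read_id
        for read_id, seq in reads.items()
        if any(counts[km] >= min_count for km in kmers(seq, k))
    }


# Toy reads: r1, r2 and r4 share the segment ACGTACGTTTGA; r3 is unique.
reads = {
    "r1": "ACGTACGTTTGACCA",
    "r2": "GGGACGTACGTTTGA",
    "r3": "TTTTCCCCAAAAGGGG",
    "r4": "CCACGTACGTTTGAAA",
}
print(sorted(flag_repetitive_reads(reads, k=8, min_count=3)))  # ['r1', 'r2', 'r4']
```

    Raising `min_count` to 4 flags nothing in this toy set, which shows how directly the threshold interacts with coverage.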

    Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

    […], where a whole-genome BAC library allows targeted access to large genomic regions. […] genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). […], both involved in multiple developmental processes including wing pattern formation.

    Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    BACKGROUND: Complete genome annotation will likely be achieved through computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We used a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated in the human, mouse, rat and Fugu (pufferfish) genomes. RESULTS: A total of 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes, and expression of these genes in testis was confirmed by Northern blotting. The genomic location of each sequence was mapped in human, mouse, rat and pufferfish, where applicable, and the structure of the cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. CONCLUSION: The comparative genomics approach identified eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes include a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that these genes may play roles in the development and/or function of testicular cells.

    Somatic sex-specific transcriptome differences in Drosophila revealed by whole transcriptome sequencing

    BACKGROUND: Understanding animal development and physiology at a molecular-biological level has been advanced by the ability to determine at high resolution the repertoire of mRNA molecules by whole transcriptome resequencing. This includes the ability to detect and quantify low-abundance transcripts and isoform-specific mRNA variants produced from a gene. The sex hierarchy consists of a pre-mRNA splicing cascade that directs the production of sex-specific transcription factors that specify nearly all sexual dimorphism. We used deep RNA sequencing to gain insight into how the Drosophila sex hierarchy generates somatic sex differences, by examining gene and transcript isoform expression differences between the sexes in adult head tissues. RESULTS: We find 1,381 genes that differ in overall expression levels and 1,370 isoform-specific transcripts that differ between males and females. Additionally, we find 512 genes not regulated downstream of transformer that are significantly more highly expressed in males than in females. These 512 genes are enriched on the X chromosome and reside adjacent to dosage compensation complex entry sites, which together suggest that residence on the X chromosome might be sufficient to confer male-biased expression. None of the transcription unit structural features examined differed robustly and significantly between genes with significant sex differences in the ratio of isoform-specific transcripts and random isoform-specific transcripts. This suggests that no single molecular mechanism generates isoform-specific transcript differences between the sexes, even though the sex hierarchy is known to include three pre-mRNA splicing factors. CONCLUSIONS: We identify thousands of genes that show sex-specific differences in overall gene expression levels, and hundreds of additional genes that have differences in the abundance of isoform-specific transcripts. No transcription unit structural feature was robustly enriched in the sex-differentially expressed transcript isoforms. Additionally, many genes with male-biased expression were enriched on the X chromosome and reside adjacent to dosage compensation entry sites, suggesting that differences in sex chromosome composition contribute to dimorphism in gene expression. Taken together, this study provides new insight into the molecular underpinnings of sexual differentiation.

    Detecting microsatellites within genomes: significant variation among algorithms

    BACKGROUND: Microsatellites are short, tandemly repeated DNA sequences that are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). RESULTS: Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared using fixed parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences in all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. CONCLUSION: Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
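
    To make the point about user-defined parameters concrete, here is a deliberately simple detector for perfect microsatellites only. It is not TRF, Mreps, Sputnik, STAR, or RepeatMasker, and it ignores imperfect repeats entirely. The function name and the default thresholds (motifs up to 6 bp, at least 3 copies) are assumptions chosen for illustration.

```python
def find_perfect_microsatellites(
    seq: str, max_motif: int = 6, min_repeats: int = 3
) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in seq (0-based starts).

    At each position, motif lengths 1..max_motif are tried and the one giving the
    longest tract is kept (ties go to the shorter motif). Scanning resumes after a
    reported tract, so overlapping calls are not reported twice.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (tract_length_bp, motif, copies)
        for m in range(1, max_motif + 1):
            motif = seq[i:i + m]
            if len(motif) < m:
                break  # too close to the end for a motif of this length
            copies = 1
            while seq[i + copies * m:i + (copies + 1) * m] == motif:
                copies += 1
            if copies >= min_repeats and (best is None or copies * m > best[0]):
                best = (copies * m, motif, copies)
        if best is None:
            i += 1
        else:
            hits.append((i, best[1], best[2]))
            i += best[0]
    return hits


seq = "GGACACACACTTTAGATAGATAGAT"
print(find_perfect_microsatellites(seq))  # [(2, 'AC', 4), (10, 'T', 3), (13, 'AGAT', 3)]
print(find_perfect_microsatellites(seq, min_repeats=4))  # [(2, 'AC', 4)]
```

    Raising `min_repeats` from 3 to 4 removes two of the three calls, all of them short tracts, which mirrors in miniature the abstract's observation that short microsatellites are the most sensitive to parameter choice.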

    Refined physical map of the human PAX2/HOX11/NFKB2 cancer gene region at 10q24 and relocalization of the HPV6AI1 viral integration site to 14q13.3-q21.1

    BACKGROUND: Chromosome band 10q24 is a gene-rich domain and host to a number of cancer, developmental, and neurological genes. Recurring translocations, deletions and mutations involving this chromosome band have been observed in different human cancers and other disease conditions, but the precise identification of breakpoint sites and detailed characterization of the genetic basis and mechanisms underlying many of these rearrangements have yet to be resolved. Towards this end it is vital to establish a definitive genetic map of this region, which to date has varied considerably over time across published studies, between different builds of the same international genomic database, and across differently constructed databases. RESULTS: Using a combination of chromosome and interphase fluorescence in situ hybridization (FISH), BAC end-sequencing and genomic database analysis, we present a physical map showing that the order and chromosomal orientation of selected genes within 10q24 is CEN-CYP2C9-PAX2-HOX11-NFKB2-TEL. Our analysis resolved the orientation of an otherwise dynamically evolving assembly of larger contigs upstream of this region and, in so doing, verifies the order and orientation of a further 9 cancer-related genes and GOT1. This study further shows that the previously reported human papillomavirus type 6a DNA integration site HPV6AI1 does not map to 10q24, but rather at the interface of chromosome bands 14q13.3-q21.1. CONCLUSIONS: This revised map will allow more precise localization of chromosome rearrangements involving chromosome band 10q24 and will serve as a useful baseline for better understanding the molecular aetiology of chromosomal instability in this region. In particular, the relocation of HPV6AI1 is important to report because this HPV6a integration site, originally isolated from a tonsillar carcinoma, has been shown to be rearranged in other HPV6a-related malignancies, including 2 of 25 genital condylomas and 2 of 7 head and neck tumors tested. Our finding shifts the focus of this genomic interest from 10q24 to the chromosome 14 site.

    Differentiating Protein-Coding and Noncoding RNA: Challenges and Ambiguities

    The assumption that RNA can be readily classified into either protein-coding or non-protein–coding categories has pervaded biology for close to 50 years. Until recently, discrimination between these two categories was relatively straightforward: most transcripts were clearly identifiable as protein-coding messenger RNAs (mRNAs) and readily distinguished from the small number of well-characterized non-protein–coding RNAs (ncRNAs), such as transfer, ribosomal, and spliceosomal RNAs. Recent genome-wide studies have revealed thousands of noncoding transcripts whose function and significance are unclear. The discovery of this hidden transcriptome, and the implicit challenge it presents to our understanding of the expression and regulation of genetic information, has made distinguishing between mRNAs and ncRNAs both more pressing and more complicated. In this Review, we consider the diverse strategies employed to discriminate between protein-coding and noncoding transcripts and the fundamental difficulties inherent in what may superficially appear to be a simple problem. Misannotations can run in both directions: some ncRNAs may actually encode peptides, and some transcripts currently thought to encode proteins may not. Moreover, recent studies have shown that some RNAs can function both as mRNAs and intrinsically as functional ncRNAs, which may be a relatively widespread phenomenon. We conclude that it is difficult to annotate an RNA unequivocally as protein-coding or noncoding, with overlapping protein-coding and noncoding transcripts further confounding this distinction. In addition, the finding that some transcripts can function both intrinsically at the RNA level and to encode proteins suggests that the dichotomy between mRNAs and ncRNAs is a false one. Therefore, the functionality of any transcript at the RNA level should not be discounted.
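
    A common first-pass criterion, though not necessarily one the review endorses, is an open reading frame (ORF) length cutoff, and it illustrates the ambiguity described above: genuine short peptide-encoding ORFs fall below any fixed cutoff, while long ORFs can arise by chance in noncoding RNA. The sketch below assumes a single given strand, ATG starts, the standard stop codons, and an illustrative 100-codon threshold; `longest_orf_codons` and `naive_is_coding` are hypothetical helpers, not a published tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (start codon included, stop excluded) of the longest
    ATG-initiated ORF on the given strand. ORFs lacking a stop codon before the
    end of the sequence are ignored."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def naive_is_coding(seq: str, min_codons: int = 100) -> bool:
    """Call a transcript 'coding' if its longest ORF reaches min_codons (assumed cutoff)."""
    return longest_orf_codons(seq) >= min_codons


long_orf = "ATG" + "GCT" * 120 + "TAA"  # synthetic 121-codon ORF
short_orf = "ATG" + "GCT" * 50 + "TAA"  # synthetic 51-codon ORF
print(longest_orf_codons("CCAUGGCCUGAAA"))  # 2
print(naive_is_coding(long_orf), naive_is_coding(short_orf))  # True False
```

    Practical classifiers typically add further evidence, such as sequence conservation or codon usage; this sketch deliberately relies on ORF length alone, which is exactly why it misclassifies the edge cases the review discusses.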